Pre-trained language models have achieved promising results in code retrieval, where a natural language documentation query is used to find the most relevant existing code snippet. However, existing models focus only on optimizing documentation-code pairs by embedding them into a latent space, without drawing on external knowledge. In this paper, we propose a generation-augmented query expansion framework. Inspired by the human retrieval process of sketching an answer before searching, we use a powerful code generation model to benefit the code retrieval task. Specifically, we demonstrate that rather than merely retrieving the target code snippet according to the documentation query, it is helpful to augment the documentation query with its generation counterpart: code snippets produced by the code generation model. To the best of our knowledge, this is the first attempt to leverage a code generation model to enhance code retrieval. We achieve new state-of-the-art results on the CodeSearchNet benchmark and significantly surpass the baselines.
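The idea of expanding a query with generated code before retrieval can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the bag-of-words `embed` function stands in for a learned encoder, `generate_code` is a hypothetical stub for a code generation model, and `CORPUS` is a made-up two-snippet collection.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a learned encoder)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def generate_code(query):
    """Hypothetical stub for a code generation model; returns a fixed draft."""
    return "def read_file(path): return open(path).read()"


def retrieve(query, corpus, augment=True):
    """Return the snippet most similar to the (optionally expanded) query."""
    expanded = query + " " + generate_code(query) if augment else query
    q = embed(expanded)
    return max(corpus, key=lambda snippet: cosine(q, embed(snippet)))


# Made-up corpus for illustration only.
CORPUS = [
    "def load_text(path):\n    with open(path) as f:\n        return f.read()",
    "def add(a, b):\n    return a + b",
]

best = retrieve("get the contents of a document", CORPUS)
```

In this toy corpus the plain query shares no tokens with the file-reading snippet, whereas the expanded query overlaps with it on tokens such as `path`, `open`, and `read`, which is the effect the expansion is meant to produce.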
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
We present a novel method to provide efficient and highly detailed reconstructions. Inspired by wavelets, our main idea is to learn a neural field that decomposes the signal both spatially and frequency-wise. We follow the recent grid-based paradigm for spatial decomposition but, unlike existing work, encourage specific frequencies to be stored in each grid via Fourier feature encodings. We then apply a multi-layer perceptron with sine activations, taking these Fourier-encoded features in at appropriate layers so that higher-frequency components are accumulated on top of lower-frequency components sequentially; these are summed to form the final output. We demonstrate that our method outperforms the state of the art regarding model compactness and efficiency on multiple tasks: 2D image fitting, 3D shape reconstruction, and neural radiance fields.
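A minimal sketch of the two ingredients named above, Fourier feature encoding per level and sequential accumulation of level outputs, is given below. The frequency schedule `2 ** k` and the function names are assumptions made for illustration; the abstract does not specify them, and the grids and sine-activated MLP are omitted.

```python
import math


def fourier_features(value, frequency):
    """Encode a scalar as a (sin, cos) pair at a single frequency."""
    angle = 2.0 * math.pi * frequency * value
    return [math.sin(angle), math.cos(angle)]


def multi_level_encoding(value, num_levels):
    """One (sin, cos) pair per level; level k uses frequency 2**k (assumed)."""
    return [fourier_features(value, 2.0 ** k) for k in range(num_levels)]


def accumulate(level_outputs):
    """Running sum, so each level adds detail on top of the coarser levels."""
    total = 0.0
    partial_sums = []
    for out in level_outputs:
        total += out
        partial_sums.append(total)
    return partial_sums


# Example: encode one coordinate at 4 levels, then sum hypothetical
# per-level outputs from coarse to fine.
encoding = multi_level_encoding(0.3, 4)
reconstruction_steps = accumulate([1.0, 0.5, 0.25])
```

The last entry of `reconstruction_steps` corresponds to the final output, and the earlier entries are the coarser intermediate reconstructions.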
Multimodal named entity recognition (MNER) and multimodal relation extraction (MRE) are two fundamental subtasks in multimodal knowledge graph construction. However, existing methods usually handle the two tasks independently, which ignores the bidirectional interaction between them. This paper is the first to propose performing MNER and MRE together as a joint multimodal entity-relation extraction task (JMERE). In addition, current MNER and MRE models only consider aligning visual objects with textual entities in the visual and textual graphs, and ignore entity-entity and object-object relationships. To address these challenges, we propose an edge-enhanced graph alignment network and a word-pair relation tagging (EEGA) for the JMERE task. Specifically, we first design a word-pair relation tagging to exploit the bidirectional interaction between MNER and MRE and to avoid error propagation. Then, we propose an edge-enhanced graph alignment network that enhances the JMERE task by aligning nodes and edges across the two graphs. Compared with previous methods, the proposed method can leverage edge information to assist the alignment between objects and entities and to find correlations between entity-entity relationships and object-object relationships. Experiments are conducted to show the effectiveness of our model.
This paper is about an extraordinary phenomenon. Suppose we do not use any low-light images as training data: can we still enhance a low-light image by deep learning? Obviously, current methods cannot do this, since deep neural networks need copious amounts of training data, especially task-related data, to train their many parameters. In this paper, we show that in the context of fundamental deep learning, it is possible to enhance a low-light image without any task-related training data. Technically, we propose a new, magical, effective and efficient method, termed \underline{Noi}se \underline{SE}lf-\underline{R}egression (NoiSER), which learns a gray-world mapping from a Gaussian distribution for low-light image enhancement (LLIE). Specifically, a self-regression model is built as a carrier to learn a gray-world mapping during training, which is performed simply by iteratively feeding in random noise. During inference, a low-light image is fed directly into the learned mapping to yield a normal-light one. Extensive experiments show that NoiSER is highly competitive with current LLIE models trained on task-related data in terms of quantitative and visual results, while outperforming them in number of parameters, training time, and inference speed. With only about 1K parameters, NoiSER takes about 1 minute to train and 1.2 ms per inference at 600$\times$400 resolution on an RTX 2080 Ti. In addition, NoiSER has an inborn automated exposure suppression capability and can automatically adjust images that are too bright or too dark, without additional manipulation.
Recently, discrete latent variable models have received a surge of interest in both Natural Language Processing (NLP) and Computer Vision (CV), owing to performance comparable to that of their continuous counterparts in representation learning while being more interpretable in their predictions. In this paper, we develop a topic-informed discrete latent variable model for semantic textual similarity, which learns a shared latent space for sentence-pair representation via vector quantization. Compared with previous models limited to local semantic contexts, our model can explore richer semantic information via topic modeling. We further boost the performance of semantic similarity by injecting the quantized representation into a transformer-based language model with a well-designed semantic-driven attention mechanism. We demonstrate, through extensive experiments across various English language datasets, that our model is able to surpass several strong neural baselines in semantic textual similarity tasks.
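The vector quantization step mentioned above maps a continuous representation to its nearest entry in a learned codebook. The sketch below shows only this lookup, under the assumption of squared Euclidean distance; the codebook values are made up, and the topic modeling and attention components are not represented.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quantize(vector, codebook):
    """Return (index, codeword) of the codebook entry nearest to vector."""
    index = min(
        range(len(codebook)),
        key=lambda i: squared_distance(vector, codebook[i]),
    )
    return index, codebook[index]


# Made-up 2-D codebook with three entries, for illustration only.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
index, code = quantize([0.9, 1.2], codebook)
```

The returned index is the discrete latent code, and the codeword is the quantized representation that a downstream model would consume.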
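We propose a novel traffic trajectory editing method that uses spatio-temporal keyframes to control vehicles during simulation in order to generate desired traffic trajectories. By taking self-motivation, path following, and collision avoidance into account, the proposed force-based traffic simulation framework updates vehicle motion in both Frenet and Cartesian coordinates. Given waypoints from the user, lane-level navigation can be generated by reference path planning. Given a keyframe, a coarse-to-fine optimization is proposed to efficiently generate a plausible trajectory that satisfies the spatio-temporal constraints. First, a directed state graph constructed along the reference path is used to search for a coarse-grained trajectory by mapping the keyframe as the goal. Then, using information extracted from the coarse trajectory as initialization, adjoint-based optimization is applied to generate a finer trajectory with smooth motion based on our force-based simulation. We validate our method through extensive experiments.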
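Because the simulation works in both Frenet and Cartesian coordinates, it may help to see one direction of the conversion. The sketch below assumes the reference path is a polyline of distinct waypoints and that the lateral offset `d` is positive to the left of the travel direction; both conventions are assumptions for illustration and are not taken from the paper.

```python
import math


def frenet_to_cartesian(s, d, path):
    """Convert arc length s and left offset d to (x, y) along a polyline.

    path is a list of (x, y) waypoints; consecutive waypoints are assumed
    to be distinct. Points beyond the end extend the last segment.
    """
    remaining = s
    last = len(path) - 2
    for i in range(len(path) - 1):
        (x0, y0), (x1, y1) = path[i], path[i + 1]
        seg = math.hypot(x1 - x0, y1 - y0)
        if remaining <= seg or i == last:
            # Unit tangent of this segment; the left normal is (-ty, tx).
            tx, ty = (x1 - x0) / seg, (y1 - y0) / seg
            return (x0 + remaining * tx - d * ty, y0 + remaining * ty + d * tx)
        remaining -= seg
    raise ValueError("path needs at least two waypoints")


# Hypothetical L-shaped reference path: 10 units east, then 10 units north.
reference_path = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
on_first_leg = frenet_to_cartesian(5.0, 1.0, reference_path)
on_second_leg = frenet_to_cartesian(12.0, 1.0, reference_path)
```

On the first leg a left offset moves the point north, and after the turn the same offset moves it west, which is why lane-relative quantities are convenient to express in Frenet coordinates.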
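Existing advertisement click-through rate (CTR) prediction models mainly depend on behavior ID features, which are learned from historical user-ad interactions. However, behavior ID features that rely on historical user behavior cannot describe new ads that have no previous interactions with users. To overcome this limitation of behavior ID features in modeling new ads, we exploit the visual content of ads to improve the performance of CTR prediction models. Specifically, we map each ad to a set of visual IDs based on its visual content. These visual IDs are then used to generate visual embeddings that enhance the CTR prediction model. We formulate the learning of visual IDs as a supervised quantization problem. Because class labels are lacking for commercial images in ads, we use the textual descriptions of images as supervision to optimize the image extractor so that it generates effective visual IDs. Meanwhile, since hard quantization is non-differentiable, we soften the quantization operation so that it supports end-to-end network training. After mapping each image to visual IDs, we learn an embedding for each visual ID based on the historical user-ad interactions accumulated in the past. Since the visual ID embedding depends only on visual content, it generalizes well to new ads. The visual ID embedding also complements the ad behavior ID embedding. It can therefore considerably improve the performance of CTR prediction models that previously relied on behavior ID features, both for new ads and for ads that have accumulated rich user behavior. After incorporating the visual ID embedding into the CTR prediction model of Baidu online advertising, the average CTR of ads improved by 1.46% and the total charge increased by 1.10%.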
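The softened quantization mentioned above can be illustrated with one common relaxation: replacing the hard nearest-codeword choice with a softmax over negative distances. This is a sketch under that assumption; the abstract does not state which relaxation is used, and the `temperature` parameter and codebook values are hypothetical.

```python
import math


def soft_assign(vector, codebook, temperature=1.0):
    """Softmax over negative squared distances to each codeword."""
    logits = [
        -sum((a - b) ** 2 for a, b in zip(vector, code)) / temperature
        for code in codebook
    ]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    return [w / total for w in weights]


def soft_quantize(vector, codebook, temperature=1.0):
    """Differentiable surrogate: probability-weighted mix of codewords."""
    probs = soft_assign(vector, codebook, temperature)
    return [
        sum(p * code[d] for p, code in zip(probs, codebook))
        for d in range(len(vector))
    ]


# Made-up codebook; each index would play the role of one visual ID.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
probs = soft_assign([0.9, 1.2], codebook, temperature=1.0)
nearly_hard = soft_quantize([0.9, 1.2], codebook, temperature=0.01)
```

As the temperature shrinks, the assignment concentrates on the nearest codeword, so the soft output approaches the hard quantization while remaining differentiable for any positive temperature.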
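In unsupervised domain adaptation (UDA), adapting directly from the source to the target domain usually suffers from significant discrepancies and leads to insufficient alignment. Many UDA works therefore attempt to close the domain gap gradually and softly through various intermediate spaces, an approach known as domain bridging (DB). However, for dense prediction tasks such as domain adaptive semantic segmentation (DASS), existing solutions mostly rely on rough style transfer, and how to bridge domains elegantly remains under-explored. In this work, we resort to data mixing to establish a deliberated domain bridging (DDB) for DASS, through which the joint distributions of the source and target domains are aligned and interact with each other in the intermediate space. At the core of DDB is a dual-path domain bridging step, which generates two intermediate domains using coarse and fine data mixing techniques, together with a cross-path knowledge distillation step, which takes two complementary models trained on the generated intermediate samples as "teachers" to develop a superior "student" in a multi-teacher distillation manner. These two optimization steps work in an alternating fashion and reinforce each other, giving DDB strong adaptation power. Extensive experiments on adaptive segmentation tasks with different settings show that our DDB significantly outperforms state-of-the-art methods. Code is available at https://github.com/xiaoachen98/ddb.git.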
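Coarse and fine data mixing can both be expressed as pasting source pixels onto a target image through a binary mask; what differs is how the mask is built. The sketch below assumes a rectangular region for the coarse case and a class-level mask for the fine case. These choices, the function names, and the tiny nested-list "images" are illustrative assumptions, not details confirmed by the abstract.

```python
def mix(source, target, mask):
    """Per-pixel mix: take source where mask is 1, target elsewhere."""
    return [
        [s if m else t for s, t, m in zip(s_row, t_row, m_row)]
        for s_row, t_row, m_row in zip(source, target, mask)
    ]


def region_mask(height, width, top, left, box_h, box_w):
    """Coarse mask: one rectangle (assumed region-level mixing)."""
    return [
        [
            1 if top <= r < top + box_h and left <= c < left + box_w else 0
            for c in range(width)
        ]
        for r in range(height)
    ]


def class_mask(labels, chosen_classes):
    """Fine mask: pixels whose source label is in chosen_classes (assumed)."""
    return [[1 if lab in chosen_classes else 0 for lab in row] for row in labels]


# Tiny 2x3 example: source pixels are 1, target pixels are 0.
source = [[1, 1, 1], [1, 1, 1]]
target = [[0, 0, 0], [0, 0, 0]]
source_labels = [[0, 1, 1], [2, 2, 0]]

coarse_mixed = mix(source, target, region_mask(2, 3, 0, 0, 2, 1))
fine_mixed = mix(source, target, class_mask(source_labels, {1}))
```

Each mixed image is one sample from an intermediate domain; in a dual-path setup, one model would be trained on the coarse samples and another on the fine samples.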
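Pre-training on mixed multi-task, multi-domain, and multi-modal data remains an open challenge in vision perception pre-training. In this paper, we propose GPPF, a General Perception Pre-training Framework, which pre-trains a task-level dynamic network, composed of knowledge "legos" in each layer, on labeled multi-task and multi-domain datasets. By examining the innate human ability to learn in complex environments, we identify three key elements and transfer them to deep networks: (1) simultaneous exposure to diverse cross-task and cross-domain information in each batch; (2) partitioned knowledge storage in separate lego units, driven by knowledge sharing; (3) sparse activation of a subset of lego units for both pre-training and downstream tasks. Notably, joint training of disparate vision tasks is non-trivial because of their differences in input shapes, loss functions, output formats, data distributions, and so on. We therefore develop a plug-and-play multi-task training algorithm that supports Single Iteration Multiple Tasks (SIMT) concurrent training. SIMT lays the foundation for pre-training with large-scale multi-task, multi-domain datasets and proves essential for stable training in our GPPF experiments. Exhaustive experiments show that our GPPF-R50 model achieves significant improvements over a strong baseline on the 8 pre-training tasks in GPPF-15M and attains a range of SOTA results on 22 downstream tasks with a similar computation budget. We also validate that GPPF generalizes to SOTA vision transformers, with consistent improvements. These solid experimental results demonstrate the effective knowledge learning, storage, sharing, and transfer provided by our novel GPPF framework.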